Causal Inference 3 Tutorial

Author: Jeremy Springman
Affiliation: University of Pennsylvania
Published: February 8, 2024

Confounders

  • A confounder is a variable that affects both the probability of receiving the treatment and the value of the outcome
  • When confounders are present, we can’t tell how much of the difference between the treatment and control groups is caused by the treatment and how much is caused by something correlated with the probability of receiving the treatment
  • To account for a confounder that we can measure, we can control for it in our model

library(ggplot2) # needed for the histogram below

set.seed(1234568)

## Confounders: generating data according to a confounder structure
n = 1000 # sample size

# Generate a treatment variable
x1 = rnorm(n, mean = 1, sd = 0.1)

# look at the data
ggplot() +
  geom_histogram(aes(x = x1), bins = 30, fill = "skyblue", color = "black") +
  geom_vline(xintercept = mean(x1), linetype = "dashed", color = "red", linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(x = "x1", y = "Frequency", title = "Histogram of x1") +
  annotate("text", x = mean(x1) + 0.04, y = 100, label = paste0("Mean = ",round(mean(x1), digits = 0)), color = "red", size = 4)

# Generate a confounder
z = rnorm(n, mean = 3, sd = 0.3)

# Generate an outcome variable
y = 2*x1 + 3*z + rnorm(n, mean = 0, sd = 1)

# create a table showing estimates of x1 with and without controlling for z
modelsummary::modelsummary(
  list(lm(y ~ x1), lm(y ~ x1 + z)),
  estimate  = "{estimate}{stars} ({std.error})",
  statistic = NULL,
  gof_omit  = 'IC|RMSE|Log|F|R2$|Std.')

              (1)                (2)
(Intercept)   8.739*** (0.406)   0.880* (0.429)
x1            2.294*** (0.404)   1.915*** (0.309)
z                                2.746*** (0.103)
Num.Obs.      1000               1000
R2 Adj.       0.030              0.431
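
In the simulation above, z is drawn independently of x1, so both models estimate a coefficient on x1 near the true value of 2, and the gap between them is consistent with sampling noise. The sketch below is one way to make z a confounder in the sense defined earlier, by letting it shift the treatment as well as the outcome. The seed, the names z_c, x1_c, and y_c, and the 0.5 dependence of the treatment on z are illustrative assumptions and are not part of the original example.

```r
set.seed(42) # arbitrary seed, chosen for this sketch only
n <- 1000

# Confounder, drawn as in the example above
z_c <- rnorm(n, mean = 3, sd = 0.3)

# Treatment now depends on the confounder (0.5 is an assumed strength)
x1_c <- 0.5 * z_c + rnorm(n, mean = 0, sd = 0.1)

# Outcome keeps the coefficients used above: 2 on the treatment, 3 on z
y_c <- 2 * x1_c + 3 * z_c + rnorm(n, mean = 0, sd = 1)

# Coefficient on the treatment without and with the confounder as a control
naive    <- coef(lm(y_c ~ x1_c))["x1_c"]
adjusted <- coef(lm(y_c ~ x1_c + z_c))["x1_c"]

c(naive = unname(naive), adjusted = unname(adjusted))
```

Under these assumptions the naive coefficient should sit well above 2, because it absorbs part of the effect of z, while the adjusted coefficient should land close to 2.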

Colliders

[1] -0.007189847

              (1)                (2)
(Intercept)   3.009*** (0.031)   0.618*** (0.040)
x1            -0.007 (0.031)     -0.790*** (0.018)
x2                               0.397*** (0.006)
Num.Obs.      1000               1000
R2 Adj.       -0.001             0.804
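
A collider is a variable that is caused by both the treatment and the outcome. The table above is consistent with that structure: x1 has essentially no association with the outcome on its own, but a strong negative coefficient appears once x2 is added. A minimal sketch of one data-generating process with this structure follows. The seed, the coefficients of 2, and the variable names are assumptions for illustration and are not taken from the code that produced the table.

```r
set.seed(99) # arbitrary seed for this sketch
n <- 1000

# Treatment and outcome are generated independently,
# so the true effect of x1_k on y_k is zero
x1_k <- rnorm(n, mean = 0, sd = 1)
y_k  <- rnorm(n, mean = 3, sd = 1)

# The collider is caused by both the treatment and the outcome
# (the coefficients of 2 are assumed values)
collider <- 2 * x1_k + 2 * y_k + rnorm(n, mean = 0, sd = 1)

# Coefficient on the treatment without and with the collider as a control
without_collider <- coef(lm(y_k ~ x1_k))["x1_k"]
with_collider    <- coef(lm(y_k ~ x1_k + collider))["x1_k"]

c(without = unname(without_collider), with = unname(with_collider))
```

In this sketch, controlling for the collider should create a negative association between treatment and outcome even though none exists in the data-generating process.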

Post-Treatment Mechanism Bias

set.seed(2233)


## Post-treatment bias: generating data where the treatment affects the outcome partly through a mechanism
n = 1000 # sample size

# Generate a treatment variable
x1 = rbinom(n, 1, 0.5)

# Generate a mechanism
mechanism = rnorm(n, mean = x1 * 2, sd = 1)

# Generate an outcome variable
y =  0.5 * x1 + .7 * mechanism + rnorm(n)

# create a table showing estimates of x1 with and without controlling for the mechanism
modelsummary::modelsummary(
  list(lm(y ~ x1), lm(y ~ x1 + mechanism)),
  estimate  = "{estimate}{stars} ({std.error})",
  statistic = NULL,
  gof_omit  = 'IC|RMSE|Log|F|R2$|Std.')

              (1)                (2)
(Intercept)   -0.033 (0.054)     -0.025 (0.043)
x1            1.956*** (0.075)   0.517*** (0.087)
mechanism                        0.704*** (0.030)
Num.Obs.      1000               1000
R2 Adj.       0.402              0.612
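
The two columns above answer different questions. Model (1) estimates the total effect of x1, which in this simulation is the direct effect (0.5) plus the effect passed through the mechanism (2 × 0.7 = 1.4), or 1.9 in total. Model (2) holds the mechanism fixed, so it recovers only the direct effect. The sketch below rebuilds the same data and checks that decomposition. The object names total, direct, a_path, and b_path are introduced here for illustration.

```r
set.seed(2233)
n <- 1000

# Same data-generating process as the example above
x1        <- rbinom(n, 1, 0.5)
mechanism <- rnorm(n, mean = x1 * 2, sd = 1)
y         <- 0.5 * x1 + 0.7 * mechanism + rnorm(n)

# Total effect of the treatment (no post-treatment control)
total <- coef(lm(y ~ x1))["x1"]

# Direct effect and mechanism-to-outcome path (mechanism held fixed)
fit    <- lm(y ~ x1 + mechanism)
direct <- coef(fit)["x1"]
b_path <- coef(fit)["mechanism"]

# Treatment-to-mechanism path
a_path <- coef(lm(mechanism ~ x1))["x1"]

# For OLS, total = direct + a_path * b_path holds exactly in the sample
all.equal(unname(total), unname(direct + a_path * b_path))
```

If the quantity of interest is the overall effect of the treatment, Model (1) is the relevant estimate; adding the mechanism as a control removes the part of the effect that runs through it.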

Reverse Causality